Voice Recognition with VOSK

24th January 2021, by Neil Stevenson

Installing VOSK

Installing VOSK is surprisingly simple if you’re using Python 3 and the latest Raspberry Pi OS. This is pretty much all we had to do:

#Python:
pip3 install vosk
#Dependencies:
sudo apt install libgfortran3
#Examples:
pip3 install pyaudio
sudo apt install libportaudio2
git clone https://github.com/alphacep/vosk-api
#Download a model file, e.g. the small US-English model:
wget http://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip

Then unzip the downloaded model ZIP file into the example folder and test:

#Unzip the model file and rename it as "model", e.g.:
cd vosk-api/python/example/
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 model
#Ensure you have a working microphone configured; you can check by using:
arecord | aplay
#Run the test code
./test_microphone.py
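
If test_microphone.py misbehaves, it can help to rule out the model by transcribing a known recording first. The sketch below shows the core VOSK loop against a 16 kHz, mono, 16-bit WAV file (test.wav is a placeholder name; the repository's test_simple.py example works along the same lines):

import wave
import json
from vosk import Model, KaldiRecognizer

# Open a 16 kHz, mono, 16-bit WAV file (test.wav is a placeholder)
wf = wave.open("test.wav", "rb")
model = Model("model")
rec = KaldiRecognizer(model, wf.getframerate())

# Feed the audio to the recogniser in chunks
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)

# Print the final transcription
print(json.loads(rec.FinalResult())['text'])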

Bluetooth Headset

Many Bluetooth headsets do not automatically add the microphone to the audio configuration file, so the microphone is not immediately usable. If this is the case for your headset, then after selecting the headset in the audio menu:

  • Edit .asoundrc, for example:
    nano ~/.asoundrc
  • Change the profile to “sco”
  • Add a pcm.input section by copying the pcm.output section
  • Add the capture.pcm section
  • Save
  • Test by using:
    arecord | aplay

For example:

pcm.!default {
        type asym
        playback.pcm {
                type plug
                slave.pcm "output"
        }
        capture.pcm {
                type plug
                slave.pcm "input"
        }
}

pcm.output {
        type bluealsa
        device "2E:A1:C2:2C:B8:84"
        profile "sco"
}

pcm.input {
        type bluealsa
        device "2E:A1:C2:2C:B8:84"
        profile "sco"
}

ctl.!default {
        type bluealsa
}
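
If arecord | aplay stays silent, it can be worth confirming that PyAudio can actually see a capture device before blaming VOSK. A minimal sketch (device names will vary with your headset):

import pyaudio

# List every device PyAudio can see and flag those with input channels
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    if info.get('maxInputChannels', 0) > 0:
        print("Input device %d: %s" % (i, info['name']))
p.terminate()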

Robot Test using Python

To test out the VOSK speech recognition on Chameleon, we used a very simple algorithm: wait for a voice command, then move for 0.5 seconds, stop, and repeat.

Here’s our full test code:

from vosk import Model, KaldiRecognizer
import pyaudio
import json
import time
import redboard

# RedBoard motor control
rb = redboard.RedBoard()

def left():
    # Spin left for half a second, then stop
    rb._set_motor_speed(0, 0.0)
    rb._set_motor_speed(1, 0.2)
    time.sleep(0.5)
    stop()

def right():
    # Spin right for half a second, then stop
    rb._set_motor_speed(0, -0.2)
    rb._set_motor_speed(1, 0.0)
    time.sleep(0.5)
    stop()

def ahead():
    # Drive forward for a second, then keep creeping ahead slowly
    rb._set_motor_speed(0, -0.15)
    rb._set_motor_speed(1, 0.15)
    time.sleep(1.0)
    rb._set_motor_speed(0, -0.02)
    rb._set_motor_speed(1, 0.02)

def reverse():
    # Drive backwards for half a second, then stop
    rb._set_motor_speed(0, 0.15)
    rb._set_motor_speed(1, -0.15)
    time.sleep(0.5)
    stop()

def stop():
    rb._set_motor_speed(0, 0.0)
    rb._set_motor_speed(1, 0.0)

def move_robot(cmd):
    # Map recognised words to actions ("top" catches a common
    # mis-hearing of "stop"); unknown words do nothing
    commands = {
        'left': left,
        'right': right,
        'go': ahead,
        'reverse': reverse,
        'stop': stop,
        'top': stop
        }
    commands.get(cmd, lambda: None)()

print("Loading model...")
model = Model("model")
rec = KaldiRecognizer(model, 16000)
print("Opening stream...")
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=4000)
stream.start_stream()

print("Ready")
last_text = "x"
while True:
    data = stream.read(4000, exception_on_overflow=False)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        result_json = rec.Result()
        text = json.loads(result_json)['text']
    else:
        result_json = rec.PartialResult()
        text = json.loads(result_json)['partial']
    print(result_json)

    # Only adjust the motors when the recognised text changes
    if last_text != text:
        move_robot(text)
        last_text = text
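
A refinement we haven't used here: KaldiRecognizer optionally takes a JSON list of allowed words as a third argument, which restricts the vocabulary and can make short command words harder to confuse. A sketch, assuming the same model and command set:

import json
from vosk import Model, KaldiRecognizer

model = Model("model")
# Limit recognition to our command words; "[unk]" soaks up anything else
words = ["left", "right", "go", "reverse", "stop", "[unk]"]
rec = KaldiRecognizer(model, 16000, json.dumps(words))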

Test Run

[Video: First test run]

Not bad! Only 1 minute 28 seconds. How fast can we now make it without stopping after each stage?